1. INTRODUCTION

Poverty levels rose during the coronavirus disease (COVID-19) pandemic in line with the decline in
the global economy [1]. The effect was felt strongly in several developing countries [2]. Indonesia is among
the countries affected, and its poverty rate has fluctuated sharply in recent times [3]. Movements in the
poverty rate can be traced through population density and the level of community income [4]. Several
studies have been carried out to address these problems. One way to support the handling and control of
poverty is classification, whose results can serve as recommendations and as a control and monitoring tool
for the government in managing the community's economy.

Classification has developed into a way of building models that solve specific problems. Artificial
intelligence (AI) is widely used for classification. AI is a multidisciplinary field that offers many research
opportunities and whose results drive the development of knowledge [5]. The development of AI also has a
major impact on the economy and has the potential to increase productivity growth [6]. AI can also be
applied to financial analysis in the form of purpose-built algorithms [7].

AI has been applied to a wide range of classification problems. Previous research showed that
classification using machine learning (ML) yields a model that identifies poverty status from household
income and expenditure [8]. Another study reported that classifying poverty status with the support vector
machine (SVM) gives a model with an accuracy of 71.93% [9]. Classification can also support poverty
reduction by measuring the indicator variables that have a significant influence [10]. The ML approach
produces fairly good performance and accuracy in classifying poverty status [11]. A data mining approach
has been used to analyze poverty data from the community based monitoring system (CBMS) database [12].
The development of classification models is also reflected in comparisons of the performance of various
classification algorithms [13]. One such system is able to process 14 levels of poverty in developing
countries [14].

Based on the previous explanation, this study proposes a model for predictive analysis and
classification of poverty status levels. The model is developed using the deep learning (DL) approach with
both unsupervised and supervised learning. The unsupervised stage adopts K-means clustering to generate
classification patterns. The supervised stage uses the artificial neural network (ANN) method and the SVM,
optimized by the Pearson correlation (PC) method, which measures the prediction and classification results.
K-means groups data that share similar patterns and performs well in this kind of analysis [15]. K-means is
also an unsupervised method that works efficiently in grouping big data [16]. To optimize K-means, the sum
of squared error (SSE) can be used to validate the cluster results [17], [18]. The K-means algorithm has been
developed and applied to data grouping in various problems [19]. Clustering is also used to process data and
group it so that the clustering result is optimized [20].

Supervised learning is the second component of the prediction analysis and classification of poverty
status. Supervised learning is typically used to solve classification problems [21] and requires labels on the
data set used [22]. The learning methods used here are ANN and SVM. ANN has developed rapidly through
earlier classification research aimed at producing optimal solutions [23]. ANN has also helped to solve
large-scale problems [24]. ANN is capable of producing classification models with fairly good performance
and accuracy [25], and it performs very well in prediction [26], [27]. In addition to ANN, the SVM is an
alternative method for classification analysis. SVM can solve problems with non-linear data by using several
kernels [28]. SVM can be recommended for classification because it gives fairly good output [29], and it
shows better generalization performance than conventional methods [30]. SVM has also been used to classify
poverty levels from attribute data with an accuracy of 71.93% [31].

Based on this explanation, this study presents an updated predictive and classification analysis
model using the DL approach. The novelty lies in the measurement process applied to the output. The
analysis model includes a systematic pre-processing stage that derives an appropriate analysis pattern by
clustering the data set. That pattern is then passed to the DL learning stage, which uses the ANN and SVM
methods. The analysis pattern is tested by validating the outputs against the indicators in the prediction and
classification process. The results of this research are expected to contribute to the development of
knowledge and can be used by the government as a means of controlling and handling the problem of
poverty.


2. RESEARCH METHOD 

The predictive analysis and classification of poverty status levels follow the stages of the research
framework. A quantitative approach is used to perform mathematical calculations on a dataset to obtain the
desired results. The framework presents a predictive and classification analysis model developed in several
stages: analysis, pre-processing, and prediction and classification. The analysis stage determines the
indicators that can be used as variables. Indicators based on field survey results are examined with the
regression method to measure their weaknesses and strengths. Once the variables are obtained, the
pre-processing stage uses K-means to produce a classification pattern. The K-means clusters divide the data
into poverty status categories, which form the prediction and classification pattern. The pattern is then used
for prediction and classification with the ANN and SVM methods, optimized by the PC method. This
optimization serves to measure and validate the analysis process so that the output reaches the highest
possible accuracy. The overall result of this analysis is an optimal model for predicting and classifying
poverty levels. The stages of the research are shown in the framework of the prediction and classification
analysis model in Figure 1.

Figure 1. Predictive analysis and classification model framework


2.1. K-means cluster

K-means clustering is a method for grouping data [32], [33]. Clustering is an unsupervised learning
technique in data mining that groups data with similar features [34]. The K-means procedure consists of the
following steps: i) determining the number of clusters, ii) selecting the initial centroids randomly according
to the number of clusters, iii) calculating the distance from each data point to each centroid using the
Euclidean distance formula, iv) updating each centroid by calculating the average value of its cluster, and
v) returning to step iii if any data point has changed cluster or any centroid value has changed [35]. The
distance to a centroid can be expressed as (1) [36].


$$D_e = \sqrt{(M_{ix} - C_{ix})^2 + (M_{iy} - C_{iy})^2} \qquad (1)$$


$D_e$ is the Euclidean distance from a data object to a centroid. $M_{ix}$ and $M_{iy}$ are the coordinates
of the object in the data. $C_{ix}$ and $C_{iy}$ are the coordinates of the center of the centroid.
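
Steps i) to v) and the distance in (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Weka implementation used in the study: the function names, the toy points, the fixed random seed, and the `sse` helper (the usual sum of squared distances to the assigned centroid) are all introduced here for illustration.

```python
import math
import random


def euclidean(point, centroid):
    """Euclidean distance between two equal-length coordinate tuples, as in (1)."""
    return math.sqrt(sum((p - c) ** 2 for p, c in zip(point, centroid)))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means following steps i) to v); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step ii: random initial centroids
    labels = [None] * len(points)
    for _ in range(max_iter):
        # step iii: assign every point to its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        # step iv: move each centroid to the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # empty cluster keeps its centroid
        # step v: stop once neither assignments nor centroids change
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return centroids, labels


def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Toy 2-D data, made up for illustration (not taken from the study).
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(data, k=2)
print(labels, round(sse(data, centroids, labels), 3))
```

Running `kmeans` for several values of `k` and comparing the resulting `sse` values is one common way to judge how many clusters the data supports.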


2.2. Artificial neural network 

ANN is a method that can be used for classification with a fairly good level of performance [37].
ANN gives very good results on problems such as classification and prediction [38], [39]. ANN learns by
means of a mathematical calculation process [40]. Learning yields a model that is applied as an algorithm to
produce decisions [41]. The model is arranged as an architecture made up of an input layer, one or more
hidden layers, and an output layer [42]. Overall, ANN aims to provide optimal output from the learning
process carried out [43].
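
The layered structure described above can be illustrated with a small feedforward pass. The sketch below is an assumption-laden illustration rather than the network trained in this study: the sigmoid activation, the random weight initialization, the zero biases, and the example input values are all chosen here for demonstration, and no training is performed.

```python
import math
import random


def sigmoid(z):
    """Logistic activation, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def init_network(layer_sizes, seed=0):
    """Random weights and zero biases for consecutive fully connected layers."""
    rng = random.Random(seed)
    network = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        network.append((weights, biases))
    return network


def forward(network, inputs):
    """Propagate one input vector through every layer and return the output vector."""
    activations = list(inputs)
    for weights, biases in network:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


# Input layer of 4 values, two hidden layers (8 and 4 neurons), one output neuron.
net = init_network([4, 8, 4, 1])
output = forward(net, [0.2, 0.5, 0.1, 0.9])  # hypothetical, already scaled inputs
print(output)
```

Each tuple in `net` holds the weights and biases that connect one layer to the next, so a list of four layer sizes produces three such connections.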


2.3. Support vector machine

SVM is an ML technique that can carry out classification and other learning tasks [44]. An SVM
performs classification using a hyperplane that separates the points in space into categories [45]. SVM is a
discriminative classifier that is formally characterized by an optimal hyperplane [46]. The hyperplane is the
dividing boundary between data segments, with each segment placed on one side of it. For example,
multiple row data classification has been carried out with two different data sets [47].
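
The role of the hyperplane can be shown with a fixed linear decision rule. The sketch below does not train an SVM. It only assumes that a weight vector `w` and offset `b` are already known, and the hand-picked values and the two sample points are hypothetical.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def distance_to_hyperplane(w, b, x):
    """Geometric distance from x to the hyperplane w.x + b = 0."""
    return abs(decision_value(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))


# Hand-picked separating line x1 + x2 - 5 = 0 (illustrative, not fitted to the study's data).
w, b = [1.0, 1.0], -5.0
low_point, high_point = (1.0, 1.0), (4.0, 4.0)

for point in (low_point, high_point):
    print(point, classify(w, b, point), round(distance_to_hyperplane(w, b, point), 3))
```

Training an SVM amounts to choosing `w` and `b` so that the smallest of these distances over the training points is as large as possible, which is what makes the hyperplane optimal.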


2.4. Pearson correlation 

PC is a statistical measure of the linear relationship between two variables and is used for
calculations in a measurement process [48]. PC can be combined with several methods to give better
results [49]. PC-based techniques can also be used to select optimized features when reviewing the output
of a model [50]. The PC calculation can be expressed as (2) [51].


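$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E\big((X - \mu_X)(Y - \mu_Y)\big)}{\sigma_X \sigma_Y} = \frac{E(XY) - E(X)E(Y)}{\sqrt{E(X^2) - E^2(X)}\,\sqrt{E(Y^2) - E^2(Y)}} \qquad (2)$$


The value of $\operatorname{cov}(X, Y)$ is the covariance between $X$ and $Y$. The values $\sigma_X$ and
$\sigma_Y$ are the standard deviations of the variables $X$ and $Y$. The value of $E(X)$ is the expected value
of $X$, also written $\mu_X$.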
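
The last form of (2) translates directly into code. The following is a minimal sketch that treats the sample means as the expected values. The function name and the sample values are illustrative and do not come from the study's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation as (E(XY) - E(X)E(Y)) divided by the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    var_x = sum(x * x for x in xs) / n - mean_x ** 2  # E(X^2) - E^2(X)
    var_y = sum(y * y for y in ys) / n - mean_y ** 2  # E(Y^2) - E^2(Y)
    if var_x <= 0 or var_y <= 0:
        raise ValueError("correlation is undefined for a constant sample")
    return (mean_xy - mean_x * mean_y) / math.sqrt(var_x * var_y)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, increasing
print(pearson([1, 2, 3], [3, 2, 1]))        # perfectly linear, decreasing
```

The result always lies between -1 and 1. For large values with small variance this one-pass form can lose precision, so a two-pass computation around the means is often preferred in practice.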


3. RESULTS AND DISCUSSION 

The prediction analysis and classification of poverty status using the DL approach begins with data
analysis. The dataset is sourced from the Central Statistics Agency of West Sumatra Province for 2020 and
2021. The data analysis determines the parameters and indicators used in the prediction and classification
process. The dataset can be seen in Table 1.


Table 1. Dataset analysis


West Sumatra Poverty Status Indicator 2020                    West Sumatra Poverty Status Indicator 2021
Total population  Poverty rate  Income    Poverty percentage  Total population  Poverty rate  Income    Poverty percentage
90,373            12.990        2.126.15  14.37               92,021            13.220        2.326.01  14.37
460,716           34.920        2.422.10  7.58                463,923           36.510        2.232.85  7.87
371,105           32.890        2.554.31  8.86                373,414           29.740        2.422.38  7.96
233,810           16.550        2.107.61  7.08                237,376           16.650        2.411.37  7.01
347,407           18.480        2.607.63  5.32                348,219           16.200        2.763.59  4.65
413,272           33.200        2.234.52  8.03                415,613           29.480        2.268.98  7.09
487,914           32.920        2.785.18  6.75                491,282           33.100        2.339.39  6.74
379,514           26.470        2.254.91  6.97                382,817           26.640        2.125.74  6.96
278,480           20.310        1.894.51  7.29                281,211           20.220        2.464.50  7.19
168,411           11.850        2.587.11  7.04                171,075           12.490        2.537.75  7.30
241,571           15.420        2.896.95  6.38                247,579           15.490        2.375.03  6.26
435,612           31.830        2.220.30  7.31                443,722           31.530        2.443.75  7.11
939,112           44.040        2.836.98  4.69                950,871           42.440        3.278.68  4.46
69,776            2.290         2.971.92  3.28                71,010            2.290         3.019.50  3.22
61,898            1.480         2.766.77  2.39                62,524            1.350         2.510.62  2.16
52,994            3.110         2.723.80  5.87                53,693            3.000         3.188.46  5.59
128,783           6.320         2.744.42  4.91                130,773           6.000         2.838.49  4.59
133,703           7.690         2.969.86  5.75                135,573           7.680         2.913.36  5.66
87,626            4.400         2.661.84  5.02                88,501            4.200         2.519.09  4.75


Table 1 shows that the analysis uses four indicator variables: population (X1), poverty rate (X2),
income (X3), and poverty percentage (X4). After the data analysis, the process continues with the
pre-processing stage, which uses the unsupervised learning concept to find the prediction and classification
patterns. The results of the K-means cluster process using the Weka software can be seen in Figure 2.

Figure 2 shows that pre-processing with K-means gives a fairly good description of the prediction
and classification patterns, as seen in the visualization of the clusters formed from the data groups in
Table 1. The clustering yields 3 levels of poverty status, namely cluster 2 (high status), cluster 1 (medium
status), and cluster 0 (low status). With these results, the predictive analysis and classification can be
carried out. The predictive analysis adopts supervised learning with an ANN. The ANN method carries out the
analysis in a way modeled on human thinking [52]. The network architecture model can be seen in Figure 3.

Figure 2. K-Means cluster process results

Figure 3 shows the design of the ANN architecture used in the prediction process. The input layer
consists of 4 neurons, one for each predetermined indicator variable. The hidden part can take a multilayer
form consisting of several layers. The output layer consists of one neuron that gives the prediction result.
The ANN is trained and tested with the feedforward algorithm. This algorithm is able to provide an
analytical model with a fairly good level of accuracy and mean square error (MSE) value in the prediction
process [53]. The results of the training and testing of the ANN can be seen in Table 2.


Table 2. Results of ANN training and testing


Single hidden layer
                      Training                          Testing
Network architecture  Accuracy   MSE       Performance  Accuracy   MSE       Performance
(4-4-1)               99.981000  0.019000  0.000190     99.925800  0.074200  0.003900
(4-6-1)               99.981000  0.019000  0.001000     99.901200  0.098800  0.005200
(4-8-1)               97.350300  2.649700  0.026497     99.937300  0.062700  0.003300
(4-10-1)              99.981000  0.019000  0.001000     99.862300  0.137700  0.007200
(4-11-1)              99.981000  0.019000  0.001000     99.930500  0.069500  0.003700

Multi hidden layer
                      Training                          Testing
Network architecture  Accuracy   MSE       Performance  Accuracy   MSE       Performance
(4-4-4-1)             99.969100  0.030900  0.0016       99.969100  0.030900  0.001600
(4-8-4-1)             99.983900  0.016100  0.000845     99.983900  0.016100  0.000845
(4-4-4-4-1)           99.880100  0.119900  0.006300     99.926900  0.073100  0.003800
(4-4-8-4-1)           99.973500  0.026500  0.001400     99.973500  0.026500  0.001400
(4-10-8-4-1)          99.835800  0.164200  0.008600     99.923000  0.077000  0.004100


Table 2 presents the training and testing results for several network architecture patterns in the
prediction process. The architectures include single hidden layer forms such as 4-4-1 (4 input neurons, 4
hidden neurons, and 1 output neuron) and multi hidden layer forms such as 4-4-4-1 (4 input neurons, 4
neurons in the first hidden layer, 4 neurons in the second hidden layer, and 1 output neuron). The best
performance is obtained with the 4-8-4-1 pattern, with an accuracy of 99.98% in both training and testing.
The results of the ANN analysis process in making predictions can be seen in Figure 4.
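
To make the architecture notation concrete, the sketch below parses a pattern such as 4-8-4-1 and counts the weights and biases of the corresponding fully connected network. It assumes one bias per non-input neuron, which the text does not state, and the helper names are introduced here for illustration.

```python
def parse_architecture(pattern):
    """Turn a pattern such as '(4-8-4-1)' or '4-8-4-1' into a list of layer sizes."""
    return [int(part) for part in pattern.strip("()").split("-")]


def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network (one bias per neuron is assumed)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


for pattern in ["(4-4-1)", "(4-8-4-1)", "(4-10-8-4-1)"]:
    sizes = parse_architecture(pattern)
    print(pattern, "hidden layers:", len(sizes) - 2, "parameters:", count_parameters(sizes))
```

Under that assumption the 4-8-4-1 pattern has 81 trainable values, compared with 25 for 4-4-1, which gives a rough sense of how model size grows across the rows of Table 2.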


[Plot: ANN training state over 3 epochs, with panels for gradient (0.018664 at epoch 3), Mu (1e-06 at epoch 3), and validation checks (1 at epoch 3)]

Figure 4. Results of ANN analysis process 

Figure 4 shows the output graph of the ANN analysis process, which reports the gradient, Mu, and
validation checks at epoch 3. After prediction with the ANN, the analysis continues with classification using
the SVM method, which can be applied to both linear and nonlinear data. The results of the SVM analysis can
be seen in Figure 5.

Figure 5. SVM analysis results


Figure 5 shows that the SVM analysis gives a fairly good result in classifying poverty status. After
prediction and classification, the process continues with validation using the PC method. This statistical
method is used together with a regression test to examine the relationship between each indicator variable
and the resulting output. The results of the validation process using PC can be seen in Table 3.


Table 3. Results of regression and correlation test process


Model summary
                                                                       Change statistics
Model  R      R Square  Adjusted R Square  Std. error of the estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .994a  .988      .984               165.464.736                 .988             283.078   4    14   .000
a. Predictors: (constant), X4, X1, X3, X2

Correlations
Control variables         X1     X2     X3     X4     Y
-none-a  X1  Correlation  1.000  .017   .013   .002   .904
         X2  Correlation  .017   1.000  -.305  .002   .993
         X3  Correlation  .013   -.305  1.000  -.501  -.312
         X4  Correlation  .002   .002   -.501  1.000  [illegible]
         Y   Correlation  .904   .993   -.312  .300   1.000
a. Cells contain zero-order (Pearson) correlations.


Table 3 shows that the regression test on the ANN prediction pattern gives an R square of 98.8%.
This result indicates that the indicator variables are strongly related to the predicted output. The correlation
test between each indicator and the prediction output (Y) also gives notable results: the population
indicator (X1) has a correlation of 90.4%, poverty data (X2) 99.3%, labor wages (X3) 31.2% (negative in
sign in Table 3), and poverty percentage (X4) 27.7%. Based on the regression and correlation test
validation, the prediction analysis and classification model for poverty status levels provides output with a
fairly good level of accuracy.
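
The figures in the model summary of Table 3 can be cross-checked against each other with standard regression identities. The sketch below assumes that the F change compares the model against an intercept-only model, so that F = (R^2 / df1) / ((1 - R^2) / df2). That assumption and the variable names are introduced here and are not stated in the table.

```python
import math

# Values read from the model summary in Table 3.
f_change, df1, df2 = 283.078, 4, 14

# Rearranging F = (R^2 / df1) / ((1 - R^2) / df2) for R^2.
r_squared = (f_change * df1) / (f_change * df1 + df2)

# Sample size implied by the degrees of freedom: n = df1 + df2 + 1.
n = df1 + df2 + 1
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - df1 - 1)

print(n, round(math.sqrt(r_squared), 3), round(r_squared, 3), round(adjusted_r_squared, 3))
```

Under that assumption the implied sample size is 19, which matches the 19 rows of Table 1, and the recomputed R, R square, and adjusted R square agree with the reported .994, .988, and .984 after rounding.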

Based on this discussion, this study provides an updated analytical model for the prediction and
classification process. The update lies in the validation measurement applied to the analysis pattern and the
resulting output. This measurement tests the entire analysis process so that the output reaches a better level
of accuracy. The research also contributes a systematic analysis concept. The concepts and methods used
were adopted successfully in DL learning and gave good results. The proposed analytical model is therefore
expected to present the analysis process precisely and accurately, so that the results can be taken into
consideration by related parties in making decisions.


4. CONCLUSION 

The analytical model proposed for prediction and classification using DL provides fairly good
performance, with a validation level of 99.8% on the analysis pattern and output. These results indicate that
the DL analysis model can be structured according to the stages of the process carried out. The
pre-processing stage makes a major contribution by forming a precise and accurate analysis pattern. Overall,
this study offers an updated analytical model in which the validation process measures the accuracy of the
results and improves the analysis process. Based on these findings, the prediction analysis and classification
model can be used as a means of controlling and handling poverty problems.